Converting Unicode Lexicon and Lexical Tools for ASCII NLP Applications
Abstract
The NLP SPECIALIST Lexicon and Lexical Tools, distributed by the National Library of Medicine (NLM), have been released in Unicode (UTF-8) format since 2006. The Lexicon is used as a corpus, while the Lexical Tools are used as software packages in NLP (Natural Language Processing) projects. Some NLP projects still handle only 7-bit ASCII characters. This paper describes how to convert the UTF-8 Lexicon and integrate the Lexical Tools into a pure ASCII NLP project, MetaMap.
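As a rough illustration of what a UTF-8 to 7-bit ASCII conversion can involve, the sketch below uses only Python's standard `unicodedata` module. It is not the method implemented in the Lexical Tools or used for MetaMap. The function name `to_ascii`, the `FALLBACK` table and its entries, and the `?` placeholder are all assumptions made for this example.

```python
import unicodedata

# Hypothetical fallback table for characters that do not decompose
# to ASCII letters. The entries are illustrative, not exhaustive.
FALLBACK = {
    "ß": "ss",
    "æ": "ae",
    "Æ": "AE",
    "ø": "o",
    "Ø": "O",
    "\u2013": "-",   # en dash
    "\u2019": "'",   # right single quotation mark
}


def to_ascii(text: str, placeholder: str = "?") -> str:
    """Return a 7-bit ASCII approximation of `text` (illustrative only)."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            # Already ASCII: keep unchanged.
            out.append(ch)
        elif unicodedata.combining(ch):
            # Stand-alone combining mark (e.g. U+0301): drop it.
            continue
        elif ch in FALLBACK:
            # Explicit mapping for symbols and special letters.
            out.append(FALLBACK[ch])
        else:
            # NFKD splits a character into base character(s) plus
            # combining marks and expands ligatures; keep the ASCII part.
            decomposed = unicodedata.normalize("NFKD", ch)
            kept = "".join(c for c in decomposed if ord(c) < 128)
            out.append(kept if kept else placeholder)
    return "".join(out)


if __name__ == "__main__":
    print(to_ascii("Sjögren\u2019s syndrome"))
```

Under these assumptions, a term such as "Sjögren’s syndrome" becomes "Sjogren's syndrome". Characters with no ASCII counterpart in the table, such as Greek letters, fall through to the placeholder, so a real conversion would need a much larger mapping.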
Similar resources
Using Lexical tools to convert Unicode characters to ASCII.
Unicode is an industry standard that allows computers to consistently represent and manipulate text expressed in most of the world's writing systems. It is widely used in multilingual NLP (natural language processing) projects. However, some NLP projects still handle only ASCII characters. This paper describes methods of utilizing lexical tools to convert Unicode character...
Automatic Construction and Validation of French Large Lexical Resources: Reuse of Verb Theoretical Linguistic Descriptions
We address in this paper some problems related to the reuse for NLP of LADL’s Lexicon-Grammar (LG). This major source of lexical knowledge about French verbs has been publicly available on the Internet for several years. However, it has not been used by the NLP community, mainly because of its format: ASCII files, each containing a table with binary values (+/-). The interpretation of these ta...
Implementing Comprehensive Derivational Features in Lexical Tools Using a Systematical Approach
A systematic approach for automatically generating derivational variants based on the SPECIALIST Lexicon was proposed and implemented in Lexical Tools [1]. This approach addressed the prefix (PD), zero (ZD), and suffix (SD) derivations from nominalizations (nomD). This paper describes the generation of SD (not from nomD) based on the Lexicon in the Lexical Tools, including both SD-Facts and SD-...
Using Element Words to Generate (Multi)words for the SPECIALIST Lexicon
The SPECIALIST Lexicon has been distributed annually by the National Library of Medicine (NLM) since 1994. Lexical records are used for Part-of-Speech (POS) tagging, indexing, information retrieval, concept mapping, etc. in many Natural Language Processing (NLP) projects, such as Lexical Tools, MetaMap, SemRep, UMLS Metathesaurus, and ClinicalTrials.gov. This paper describes a new systematic ap...
03. Tools and Procedures for the Acquisition of Morphological and Syntactic Information from Corpora
Over the past decades, the importance of the lexicon has increased in both natural language processing (NLP) and linguistic theory. Within NLP, much of the early research focused on isolated ‘toy’ tasks, treating the lexicon as a peripheral component. These days, the focus is on constructing systems suitable for the treatment of large, naturally occurring texts, and therefore rich lexical resou...
Publication date: 2011